Exploiting Partial Replication in Unbalanced Parallel Loop Scheduling on Multicomputers

Author

  • Salvatore Orlando
Abstract

We consider the problem of scheduling parallel loops whose iterations operate on large array data structures and are characterized by highly varying execution times (unbalanced or non-uniform parallel loops). A general parallel loop implementation template for message-passing distributed-memory multiprocessors (multicomputers) is presented. Assuming that it is impossible to statically determine the distribution of the computational load on the data accessed, the template exploits a hybrid scheduling strategy. The data are partially replicated on the processors' local memories, and iterations are statically scheduled until the first load imbalances are detected. At this point an effective dynamic scheduling technique is adopted to move iterations among nodes holding the same data. Most of the communications needed to implement dynamic load balancing are overlapped with computations, as a very effective prefetching policy is adopted. The template scales very well, since knowing where data are replicated makes it possible to balance the load without introducing high overheads. The paper also proposes a formal characterization of the load imbalance related to a generic problem instance. This characterization is used to derive an analytical cost model for the template and, in particular, to tune those template parameters that depend on the costs related to the specific features of the target machine and the specific problem. The template and the related cost model are validated by experiments conducted on a 128-node nCUBE 2, whose results are reported and discussed.
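The hybrid strategy described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions and not the paper's template: the function name `simulate`, the "take half of the most loaded peer's pending iterations" rule, and the fixed-size replication groups are all hypothetical choices made for illustration. Communication and prefetching costs are ignored entirely.

```python
import heapq


def simulate(costs, n_procs, group_size, dynamic=True):
    """Return the makespan of a simulated loop schedule.

    costs      -- per-iteration execution times (hypothetical input)
    n_procs    -- number of processors
    group_size -- processors per replication group; members of a group are
                  ASSUMED to hold copies of each other's data blocks, so
                  iterations may only migrate inside a group
    dynamic    -- if False, only the static block schedule is used
    """
    n = len(costs)
    # Static phase: block-distribute the iterations over the processors.
    queues = [list(costs[p * n // n_procs:(p + 1) * n // n_procs])
              for p in range(n_procs)]

    # Event queue of (time at which the processor becomes free, processor id).
    free_at = [(0.0, p) for p in range(n_procs)]
    heapq.heapify(free_at)
    makespan = 0.0

    while free_at:
        t, p = heapq.heappop(free_at)
        if not queues[p] and dynamic:
            # Imbalance detected: p is idle. Look only at peers in the same
            # replication group, since only they share p's data.
            first = (p // group_size) * group_size
            last = min(first + group_size, n_procs)
            peers = [q for q in range(first, last) if q != p]
            donor = max(peers, key=lambda q: len(queues[q]), default=None)
            if donor is not None and queues[donor]:
                # Illustrative policy: migrate half of the donor's pending
                # iterations (rounded up) to the idle processor.
                take = (len(queues[donor]) + 1) // 2
                queues[p] = queues[donor][-take:]
                del queues[donor][-take:]
        if queues[p]:
            # Run the next pending iteration on p.
            heapq.heappush(free_at, (t + queues[p].pop(0), p))
        else:
            # No work left anywhere p can reach: p retires.
            makespan = max(makespan, t)
    return makespan
```

Calling `simulate(costs, n_procs, group_size, dynamic=False)` and then again with `dynamic=True` on a skewed cost vector gives a rough feel for how much migration inside a replication group can shorten the schedule; setting `group_size=1` disables migration, because no other node holds the data.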


Similar resources


Chain-Based Scheduling: Part I - Loop Transformations and Code Generation

Chain-based scheduling [1] is an efficient partitioning and scheduling scheme for nested loops on distributed-memory multicomputers. The idea is to take advantage of the regular data dependence structure of a nested loop to overlap and pipeline the communication and computation. Most partitioning and scheduling algorithms proposed for nested loops on multicomputers [1,2,3] are graph algorithms on...


Reducing Data Communication Overhead for Doacross Loop Nests

If the loop iterations of a loop nest cannot be partitioned into independent sets, data communication for the data dependences is inevitable in order to execute them on parallel machines. Loop nests of this kind are referred to as Doacross loop nests. This paper is concerned with compiler algorithms for parallelizing Doacross loop nests for distributed-memory multicomputers. We present a metho...



Probabilistic Analysis of Scheduling Precedence Constrained Parallel Tasks on Multicomputers with Contiguous Processor Allocation

Given a set of precedence constrained parallel tasks with their processor requirements and execution times, the problem of scheduling precedence constrained parallel tasks on multicomputers with contiguous processor allocation is to find a nonpreemptive schedule of the tasks on a multicomputer such that the schedule length is minimized. This scheduling problem is substantially more difficult t...



Publication date: 1995